Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.8.7/8.8.7) with SMTP id QAA08544
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Tue, 17 Mar 1998 16:26:20 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA17517; Tue, 17 Mar 1998 16:26:19 -0700
Message-Id: <199803172152.OAA16672@orpheus.gemini.edu>
From: swampler@noao.edu (Steve Wampler)
Date: Tue, 17 Mar 1998 14:52:34 MST
In-Reply-To: Mark Evans's mail message of Mar 13, 4:06pm.
X-Mailer: Mail User's Shell (7.2.3 5/22/91)
To: icon-group@optima.CS.Arizona.EDU
Subject: Re: Letter Probabilities
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Content-Length: 1502
On Mar 17 at 10:16am, Mark Evans writes:
} I think that we have here one of those classic execution-time vs.
} memory-space tradeoffs that they teach about in computer science
} classes.
}
} Mark
}
} Paul Abrahams wrote:
} >
} > Using the probabilities to construct a weighted string and then
} > selecting a random character from the string does have one unaesthetic
} > property, in my book: the length of the string grows exponentially with
} > the precision of the probabilities. For that reason I'd opt for a
} > binary search, which takes 5 probes no matter what the precision.
} >
} > I wonder if there's another way of attacking this problem.
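[Editorial note: Paul's binary-search idea — a cumulative probability table probed in O(log n) steps — can be sketched as follows. The thread is about Icon, but Python is used here purely for illustration; the letter set and probabilities are hypothetical, not from the thread. With 26 letters, bisection needs at most ceil(log2 26) = 5 probes regardless of how precise the probabilities are.]

```python
import bisect
import itertools
import random

# Hypothetical letter probabilities (must sum to 1.0).
letters = ['a', 'b', 'c', 'd']
probs   = [0.50, 0.25, 0.15, 0.10]

# Cumulative distribution table: [0.50, 0.75, 0.90, 1.00].
cum = list(itertools.accumulate(probs))

def weighted_pick():
    """Pick a letter by binary-searching the cumulative table."""
    r = random.random()                      # uniform in [0, 1)
    # bisect gives the index of the first cumulative value > r,
    # in O(log n) probes -- the table never grows with precision.
    return letters[bisect.bisect_left(cum, r)]
```

Note the space cost: the table holds one entry per distinct letter, independent of how many decimal places the probabilities carry — which is exactly the advantage Paul claims over the expanded weighted string.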
I'm not sure I understand this. The upper bound on the weighted string
is the length of the input text itself. In fact, the input text
itself is the *perfect* weighted string! There is no need to reduce
the string to probabilities and then expand the probabilities back
into a different string - all this does is (a) slow things down and
(b) reduce the 'accuracy' of the probabilities. ((b) isn't all that
important when producing 'random' text, I suppose.)
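[Editorial note: Steve's observation — that the input text is itself the perfect weighted string — reduces to a one-liner. A minimal Python sketch for illustration (the sample string is hypothetical); picking a character uniformly at random from the text yields each character with exactly its frequency in the sample, no probability table needed.]

```python
import random

def random_char(text: str) -> str:
    # The text is its own weighted string: a uniform pick over
    # positions returns each character in proportion to how often
    # it occurs in the sample.
    return random.choice(text)

sample = "abracadabra"   # 'a' should come out roughly 5 times in 11
draw = random_char(sample)
```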
Now, for *really, really* large input samples, this becomes
inefficient (memory-wise). It's certainly not an issue for the
20K character lengths I've seen mentioned here. I regularly
process 10-MB strings with Icon.
--
Steve Wampler - swampler@gemini.edu [Gemini 8m Telescopes Project (under AURA)]
The gods that smiled at your birth are now laughing openly. (Fortune Cookie)